1. Introduction

Case-cohort designs, originally proposed by Prentice (1986) for right-censored 
survival data, are very useful in large epidemiologic cohort studies, and their 
applications are increasingly common in biomedical research. In a case-cohort 
study, complete data are only obtained for all failures observed during follow- 
up and for a sub-sample, called the subcohort, of the entire cohort. The sub- 
cohort can be a simple random or stratified sub-sample. Such a design is cost- 
effective for studies of rare events, and has been extended to other models including the additive hazards model (Kulich and Lin, 2000), transformation models (Chen and Zucker, 2009; Kong et al., 2004; Lu and Tsiatis, 2006), and the accelerated failure time model (Nan, Kalbfleisch, and Yu, 2009; Nan, Yu, and Kalbfleisch, 2006), and also to other censoring mechanisms (Li, Gilbert, and Nan, 2008; Li and Nan, 2011), among many others.

For right-censored data, the pseudo likelihood approach of Self and Prentice 
(1988) constructs risk sets from subcohort only, thus the counting process mar- 
tingale theory is naturally applicable for deriving the asymptotic properties for 
the Cox-type regression models. This same strategy can be applied to some 
other regression models for right-censored data, for example, the accelerated 
failure time model studied by Nan, Yu, and Kalbfleisch (2006). Since complete 
information is also observed for all the failures, constructing risk sets from 
all observed data including failures outside the subcohort would yield more 
efficient estimation. This has been observed by many authors, for example, Borgan, Langholz, Samuelsen, Goldstein, and Pogoda (2000); Chen and Lo (1999); Chen and Zucker (2009); Kalbfleisch and Lawless (1988); Kulich and Lin (2000, 2004); Nan, Kalbfleisch, and Yu (2009). The development of the corresponding asymptotic theories has been primarily based on calculations of counting process stochastic integrals. Such a method, however, lacks theoretical justification because the integrands of those stochastic integrals are not predictable, and not even adapted, with respect to any filtration generated from the history.

To overcome this technical hurdle, we consider a general semiparametric Z-estimation method for bundled parameters using empirical process theory; see e.g. van der Vaart and Wellner (1996, 2007). Our approach does not use the stochastic integral formulation, so there is no predictability requirement. The main body of the article is organized as follows. In Section 2, we introduce a general asymptotic theory for semiparametric Z-estimation with bundled parameters. We then apply the Z-estimation theory to prove the asymptotic properties for case-cohort studies in Section 3. Both the Cox model and the additive hazards model with time-dependent covariates will be considered. We make some concluding remarks in Section 4.

2. Semiparametric Z-estimation for bundled parameters 

Let $\theta \in \Theta \subset \mathbb{R}^d$ be the parameter of interest, and $\eta : \mathcal{X} \times \Theta \to \mathbb{R}^J$ be infinite dimensional nuisance parameter(s) in a Banach space $\mathcal{H} = \{(x,\theta) \mapsto \eta(x,\theta) \in \mathbb{R}^J : x \in \mathcal{X}, \theta \in \Theta\}$. Such a parametrization allows the nuisance parameter to be a function of the parameter of interest, thus the two types of parameters are bundled together, a terminology originally used by Huang and Wellner (1997) and further studied by, for example, Ding and Nan (2011). Denote the random map $\mathcal{X}^n \to \mathbb{R}^d$ with $n$ observations $X_1, \dots, X_n$ as

$$\Psi_n(\theta, \eta) = \Psi_n(X_1, \dots, X_n; \theta, \eta(\cdot;\theta)), \tag{2.1}$$

which becomes an estimating function for $\theta$ when $\eta$ is given or replaced by its estimator. For independent and identically distributed (i.i.d.) observations $X_1, \dots, X_n$, very often $\Psi_n(\theta, \eta)$ takes the following form:

$$\Psi_n(\theta, \eta) = \frac{1}{n} \sum_{i=1}^n \psi(X_i; \theta, \eta(\cdot;\theta)), \tag{2.2}$$

where $\psi(\theta,\eta) = \psi(X;\theta,\eta(\cdot;\theta))$ is a random map $\mathcal{X} \to \mathbb{R}^d$ with a single observation $X$.

Here we use the term "nuisance parameter" in a rather loose sense. It does not need to be an actual parameter (for example, the baseline hazard function in the Cox model) in the original parametrization of the distribution of $X$. Broadly speaking, it is an unknown quantity in the estimating function in addition to the parameter of interest. The unknown quantity as a function of $\theta$ needs to be estimated prior to estimating $\theta$. We call the solution to $\Psi_n(\theta, \hat\eta_n(\cdot;\theta)) = 0$ the Z-estimator for $\theta$, where $\hat\eta_n$ is some estimator for $\eta$. This type of generalization has been considered in the econometrics literature; see, for example, Newey (1994) and Chen, Linton, and Van Keilegom (2003). We provide slightly modified results of Chen, Linton, and Van Keilegom (2003) with a focus on Z-estimation in the following lemmas, which we will use for the estimators in the case-cohort studies considered in this article. Proofs of the lemmas are provided in the Appendix.

Let $\theta_0$ denote the true value of $\theta$ and $\eta_0$ be the true functional form of $\eta$. Let $\Psi(\theta,\eta)$ be a deterministic function, which usually denotes the limit of $\Psi_n(\theta,\eta)$ as $n \to \infty$. We use $p^*$ to denote "in outer probability", and refer to van der Vaart and Wellner (1996) for its definition and detailed discussion. Note that none of the lemmas in this section requires i.i.d. data, though data in the case-cohort studies we consider are assumed to be i.i.d. Let $|\cdot|$ be the Euclidean norm. Let $\|\cdot\|$ be the supremum of a norm or semi-norm taken over all $\theta \in \Theta$, that is, $\|\eta\| = \sup_{\theta\in\Theta} \rho(\eta(\cdot;\theta))$ for some norm or semi-norm $\rho$; for example, $\rho(\eta(\cdot;\theta)) = \sup_{x\in\mathcal{X}} |\eta(x;\theta)|$, which gives $\|\eta\| = \sup_{\theta\in\Theta} \sup_{x\in\mathcal{X}} |\eta(x;\theta)|$.

Lemma 2.1. (Consistency.) Suppose $\theta_0$ is the unique solution to $\Psi(\theta, \eta_0(\cdot;\theta)) = 0$ in the parameter space $\Theta$ and $\hat\eta_n$ is an estimator of $\eta_0$ such that $\|\hat\eta_n - \eta_0\| = o_{p^*}(1)$. If

$$\sup_{\theta\in\Theta,\ \|\eta-\eta_0\| \le \delta_n} \frac{|\Psi_n(\theta,\eta(\cdot;\theta)) - \Psi(\theta,\eta_0(\cdot;\theta))|}{1 + |\Psi_n(\theta,\eta(\cdot;\theta))| + |\Psi(\theta,\eta_0(\cdot;\theta))|} = o_{p^*}(1) \tag{2.3}$$

for every sequence $\{\delta_n\} \downarrow 0$, then $\hat\theta_n$ satisfying $\Psi_n(\hat\theta_n, \hat\eta_n(\cdot;\hat\theta_n)) = o_{p^*}(1)$ converges in outer probability to $\theta_0$.
Since consistency is a global property, our main condition, equation (2.3), is necessarily global; that is, the supremum is taken over all of $\Theta$. The $p^*$ in equation (2.3) indicates that the left-hand side converges to 0 in outer probability in case the term on the left is not Borel measurable. It is a stronger condition to require that the convergence holds when the denominator is replaced by 1. The purpose of adding the extra terms in the denominator is to control the numerator when it blows up to infinity for some $\theta \in \Theta$.

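Lemma 2.2. (Rate of convergence and asymptotic representation.) Let $\mathcal{H}_0 = \{\eta(x;\theta) : x \in \mathcal{X}, \theta \in \Theta_0\}$ be a collection of functions that are continuously differentiable in $\theta$ for all $x \in \mathcal{X}$ with bounded derivative matrices $\{\dot\eta(\cdot;\theta)\}$, where $\Theta_0 \subset \Theta$ is a neighborhood of $\theta_0$. Suppose that $\hat\theta_n$ satisfying $\Psi_n(\hat\theta_n, \hat\eta_n(\cdot;\hat\theta_n)) = o_{p^*}(n^{-1/2})$ is a consistent estimator of $\theta_0$ that is the unique solution to the equation $\Psi(\theta, \eta_0(\cdot;\theta)) = 0$ in $\Theta$, and that $\hat\eta_n \in \mathcal{H}_0$ is an estimator of $\eta_0 \in \mathcal{H}_0$ satisfying $\|\hat\eta_n - \eta_0\| = O_{p^*}(n^{-\beta})$ for some $\beta > 0$. Suppose the following four conditions are satisfied:

(i) (Stochastic equicontinuity.)

$$\frac{|n^{1/2}(\Psi_n - \Psi)(\hat\theta_n, \hat\eta_n(\cdot;\hat\theta_n)) - n^{1/2}(\Psi_n - \Psi)(\theta_0, \eta_0(\cdot;\theta_0))|}{1 + n^{1/2}|\Psi_n(\hat\theta_n, \hat\eta_n(\cdot;\hat\theta_n))| + n^{1/2}|\Psi(\hat\theta_n, \hat\eta_n(\cdot;\hat\theta_n))|} = o_{p^*}(1).$$

(ii) $n^{1/2}\Psi_n(\theta_0, \eta_0(\cdot;\theta_0)) = O_{p^*}(1)$.

(iii) (Smoothness.) (a) If $\beta = 1/2$, the function $\Psi(\theta, \eta(\cdot;\theta)) : \Theta_0 \times \mathcal{H}_0 \to \mathbb{R}^d$ is Fréchet differentiable at $(\theta_0, \eta_0(\cdot;\theta_0))$, i.e., there exists a continuous $d \times d$ matrix $\dot\Psi_1(\theta_0, \eta_0(\cdot;\theta_0))$ and a continuous linear functional $\dot\Psi_2(\theta_0, \eta_0(\cdot;\theta_0))$ such that

$$\begin{aligned}
&\big|\Psi(\theta, \eta(\cdot;\theta)) - \Psi(\theta_0, \eta_0(\cdot;\theta_0)) \\
&\quad - \{\dot\Psi_1(\theta_0, \eta_0(\cdot;\theta_0)) + \dot\Psi_2(\theta_0, \eta_0(\cdot;\theta_0))[\dot\eta_0(\cdot;\theta_0)]\}(\theta - \theta_0) \\
&\quad - \dot\Psi_2(\theta_0, \eta_0(\cdot;\theta_0))[(\eta - \eta_0)(\cdot;\theta_0)]\big| \\
&= o(|\theta - \theta_0|) + o(\|\eta - \eta_0\|);
\end{aligned} \tag{2.4}$$

or (b) if $0 < \beta < 1/2$, for some $\alpha > 1$ satisfying $\alpha\beta > 1/2$ we have

$$\begin{aligned}
&\big|\Psi(\theta, \eta(\cdot;\theta)) - \Psi(\theta_0, \eta_0(\cdot;\theta_0)) \\
&\quad - \{\dot\Psi_1(\theta_0, \eta_0(\cdot;\theta_0)) + \dot\Psi_2(\theta_0, \eta_0(\cdot;\theta_0))[\dot\eta_0(\cdot;\theta_0)]\}(\theta - \theta_0) \\
&\quad - \dot\Psi_2(\theta_0, \eta_0(\cdot;\theta_0))[(\eta - \eta_0)(\cdot;\theta_0)]\big| \\
&= o(|\theta - \theta_0|) + O(\|\eta - \eta_0\|^{\alpha}).
\end{aligned} \tag{2.5}$$

Here the subscripts 1 and 2 correspond to the first and the second arguments in $\Psi(\cdot,\cdot)$, respectively, and we assume that the matrix

$$A = -\dot\Psi_1(\theta_0, \eta_0(\cdot;\theta_0)) - \dot\Psi_2(\theta_0, \eta_0(\cdot;\theta_0))[\dot\eta_0(\cdot;\theta_0)]$$

is nonsingular.

(iv) $n^{1/2}\dot\Psi_2(\theta_0, \eta_0(\cdot;\theta_0))[(\hat\eta_n - \eta_0)(\cdot;\theta_0)] = O_{p^*}(1)$.

Then $\hat\theta_n$ is $n^{1/2}$-consistent and further we have

$$n^{1/2}(\hat\theta_n - \theta_0) = A^{-1} n^{1/2}\big\{(\Psi_n - \Psi)(\theta_0, \eta_0(\cdot;\theta_0)) + \dot\Psi_2(\theta_0, \eta_0(\cdot;\theta_0))[(\hat\eta_n - \eta_0)(\cdot;\theta_0)]\big\} + o_{p^*}(1). \tag{2.6}$$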

Remark: For i.i.d. data, Condition (i) in Lemma 2.2 holds if the class of functions $\{\psi(\theta,\eta) : |\theta - \theta_0| < \delta, \|\eta - \eta_0\| < \delta\}$ is Donsker for some $\delta > 0$ and satisfies $P_0|\psi(\theta,\eta;X) - \psi(\theta_0,\eta_0;X)|^2 \to 0$ as $|\theta - \theta_0| \to 0$ and $\|\eta - \eta_0\| \to 0$ (see e.g. Corollary 2.3.12 of van der Vaart and Wellner (1996), page 115). Though simpler, this is stronger than Condition (i). Condition (ii) holds automatically for i.i.d. data if $P_0|\psi(\theta_0,\eta_0)|^2 < \infty$ and $\Psi_n$ takes the form in (2.2). In Condition (iii), $\{\dot\Psi_1(\theta_0,\eta_0(\cdot;\theta_0)) + \dot\Psi_2(\theta_0,\eta_0(\cdot;\theta_0))[\dot\eta_0(\cdot;\theta_0)]\}(\theta - \theta_0)$ is obtained by the chain rule, which is the usual inner product of a $d \times d$ matrix and a $d \times 1$ vector; whereas $\dot\Psi_2(\theta_0,\eta_0(\cdot;\theta_0))[(\eta - \eta_0)(\cdot;\theta_0)] = \sum_{j=1}^J \dot\Psi_{2j}(\theta_0,\eta_0(\cdot;\theta_0))[(\eta_j - \eta_{0j})(\cdot;\theta_0)]$, where $J$ is the number of infinite dimensional parameters contained in $\eta$, is the sum of separate terms with each being a bounded linear functional that brings $\eta - \eta_0$ to a real number, where $\eta$ is close to $\eta_0$ in $n^{\beta}$-rate for some $\beta > 0$. Note that equation (2.5) is indeed a stronger condition than equation (2.4). Proposition 1 of Bickel, Klaassen, Ritov, and Wellner (1993), page 455, provides useful tools for checking Fréchet differentiability for infinite-dimensional parameters. Condition (iv) holds automatically under (iii) if $\hat\eta_n$ is $n^{1/2}$-consistent, but may require extensive work for slower than root-$n$ convergence rate, see e.g. Wong and Severini (1991) and Huang and Wellner (1995). In view of the structure of equation (2.6), the asymptotic distribution of $n^{1/2}(\hat\theta_n - \theta_0)$ is determined by the asymptotic joint distribution of the random variables $n^{1/2}(\Psi_n - \Psi)(\theta_0, \eta_0(\cdot;\theta_0))$ and $n^{1/2}\dot\Psi_2(\theta_0, \eta_0(\cdot;\theta_0))[(\hat\eta_n - \eta_0)(\cdot;\theta_0)]$, particularly if the asymptotic joint distribution is multivariate Gaussian.

In the case that $\eta$ is free of $\theta$, we have $\dot\eta = 0$. Then Lemma 2.2 reduces to the following corollary that was studied by Hu (1998). The corollary is particularly useful for the case-cohort additive hazards model in the next section. Now we replace $\dot\Psi_1$ by $\dot\Psi_\theta$ and $\dot\Psi_2$ by $\dot\Psi_\eta$ without causing any confusion, and the notation $\|\cdot\|$ becomes a norm.

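Corollary 2.1. (Rate of convergence and asymptotic representation.) Suppose that $\hat\theta_n$ satisfying $\Psi_n(\hat\theta_n, \hat\eta_n) = o_{p^*}(n^{-1/2})$ is a consistent estimator of $\theta_0$ that is the unique solution to $\Psi(\theta, \eta_0) = 0$ in $\Theta$, and that $\hat\eta_n$ is an estimator of $\eta_0$ satisfying $\|\hat\eta_n - \eta_0\| = O_{p^*}(n^{-\beta})$ for some $\beta > 0$. Suppose the following four conditions are satisfied:

(i) (Stochastic equicontinuity.)

$$\frac{|n^{1/2}(\Psi_n - \Psi)(\hat\theta_n, \hat\eta_n) - n^{1/2}(\Psi_n - \Psi)(\theta_0, \eta_0)|}{1 + n^{1/2}|\Psi_n(\hat\theta_n, \hat\eta_n)| + n^{1/2}|\Psi(\hat\theta_n, \hat\eta_n)|} = o_{p^*}(1).$$

(ii) $n^{1/2}\Psi_n(\theta_0, \eta_0) = O_{p^*}(1)$.

(iii) (Smoothness.) (a) If $\beta = 1/2$, the function $\Psi(\theta,\eta)$ is Fréchet differentiable at $(\theta_0, \eta_0)$, i.e., there exists a continuous and nonsingular $d \times d$ matrix $\dot\Psi_\theta(\theta_0, \eta_0)$ and a continuous linear functional $\dot\Psi_\eta(\theta_0, \eta_0)$ such that

$$|\Psi(\theta,\eta) - \Psi(\theta_0,\eta_0) - \dot\Psi_\theta(\theta_0,\eta_0)(\theta - \theta_0) - \dot\Psi_\eta(\theta_0,\eta_0)[\eta - \eta_0]| = o(|\theta - \theta_0|) + o(\|\eta - \eta_0\|); \tag{2.7}$$

or (b) if $0 < \beta < 1/2$, for some $\alpha > 1$ satisfying $\alpha\beta > 1/2$ we have

$$|\Psi(\theta,\eta) - \Psi(\theta_0,\eta_0) - \dot\Psi_\theta(\theta_0,\eta_0)(\theta - \theta_0) - \dot\Psi_\eta(\theta_0,\eta_0)[\eta - \eta_0]| = o(|\theta - \theta_0|) + O(\|\eta - \eta_0\|^{\alpha}). \tag{2.8}$$

(iv) $n^{1/2}\dot\Psi_\eta(\theta_0,\eta_0)[\hat\eta_n - \eta_0] = O_{p^*}(1)$.

Then $\hat\theta_n$ is $n^{1/2}$-consistent and further we have

$$n^{1/2}(\hat\theta_n - \theta_0) = \{-\dot\Psi_\theta(\theta_0,\eta_0)\}^{-1} n^{1/2}\big\{(\Psi_n - \Psi)(\theta_0,\eta_0) + \dot\Psi_\eta(\theta_0,\eta_0)[\hat\eta_n - \eta_0]\big\} + o_{p^*}(1). \tag{2.9}$$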

3. Case-Cohort Studies 

We consider two models that are used for analyzing case-cohort data: the Cox model and the additive hazards model. Let $X$ be the generic random variable that consists of several random variables. Let $T$ be the failure time and $C$ the censoring time; we only observe $Y = \min(T, C)$ and the failure indicator $\Delta = 1(T \le C)$. Let $Z(\cdot)$ be the $d$-dimensional covariate process and $\bar Z(t)$ be the covariate history up to time $t$. We assume that for all $t$, events $\{T > t\}$ and $\{C > t\}$ are conditionally independent given $\bar Z(t)$, and both are independent of $\{Z(s) : s > t\}$. In other words, $Z(\cdot)$ is an external covariate, see Kalbfleisch and Prentice (2002). Suppose potentially we would have $n$ i.i.d. copies of $(Y, \Delta, \bar Z(Y))$ in the full cohort, but we only observe $\bar Z(Y)$ for all failures and for subjects in the subcohort, which is a sub-sample of the entire cohort. The subcohort may be selected using a variety of sampling schemes including simple random sampling and stratified sampling based on some auxiliary variable $Z^*(\cdot)$ that can be a subset of $Z(\cdot)$, may or may not be time-dependent, and is available for everyone in the cohort. We focus on the independent Bernoulli sampling method for selecting the subcohort, by which a coin is flipped for each subject $i$ in the cohort with a given success probability $\pi_i$ that may depend on $Z_i^*$. For finite population sampling methods, as applied in Breslow and Wellner (2007), we expect the weighted bootstrap empirical process theory of Praestgaard and Wellner (1993) to be a useful tool to verify conditions in Lemmas 2.1 and 2.2. See Saegusa and Wellner (2012) for a related problem using the weighted bootstrap empirical process theory.
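
The coin-flipping scheme can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `bernoulli_subcohort`, the binary auxiliary variable, and the probabilities 0.2 and 0.5 are hypothetical and do not come from the article.

```python
import random

def bernoulli_subcohort(pi, rng):
    """Flip an independent coin for each subject; pi[i] is the success probability."""
    return [1 if rng.random() < p else 0 for p in pi]

# Hypothetical cohort: the selection probability depends on a binary auxiliary variable.
z_star = [0, 1, 0, 1, 1, 0]
pi = [0.5 if z == 1 else 0.2 for z in z_star]
selected = bernoulli_subcohort(pi, random.Random(0))
```

Because each flip is independent of the others, the resulting data remain i.i.d., which is the property the asymptotic arguments in this section rely on.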

Let $R_i$ be the subcohort indicator that equals 1 if the $i$th subject is selected into the subcohort and 0 otherwise. Then $\pi_i = P(R_i = 1 \mid Z_i^*)$. Thus the observed data in such a case-cohort study are i.i.d. and the missing data mechanism is missing at random (Little and Rubin, 2002). The following is a set of common regularity conditions for both models.

Assumption (A): The sample paths of $Z(\cdot) \in \mathcal{Z}$ are bounded with bounded variation, and the parameter space $\Theta$ is compact.

Assumption (B): The conditional distribution of $T$ given $Z(\cdot)$ possesses a continuous Lebesgue density.

Assumption (C): The study stops at a finite time $\tau > 0$ such that, for constants $\sigma_1$ and $\sigma_2$, $\inf_{z\in\mathcal{Z}} P(C > \tau \mid \bar Z(\tau) = \bar z(\tau)) = \sigma_1 > 0$ and $\inf_{z\in\mathcal{Z}} P(T > \tau \mid \bar Z(\tau) = \bar z(\tau)) = \sigma_2 \in (0,1)$.

Assumption (D): The map $\Psi(\theta, \eta(\cdot;\theta)) = P\psi(\theta, \eta(\cdot;\theta))$ is Fréchet differentiable at $(\theta_0, \eta_0(\cdot;\theta_0))$ with a nonsingular partial derivative with respect to $\theta$ at $(\theta_0, \eta_0(\cdot;\theta_0))$.

Assumption (E): In case-cohort studies, data are missing at random with $\pi_i \ge \sigma_3 > 0$ for all $i$ and a constant $\sigma_3$.

Note that the assumption of compact $\Theta$ is only for technical convenience, and is unnecessarily strong. Later we will see that for the additive hazards model, $\eta$ is free of $\theta$. The following is some standard empirical process notation that we will use in the rest of the paper. Suppose $X_1, \dots, X_n$ are i.i.d. $p$-dimensional random variables that follow the distribution $P$ on a measurable space $(\mathcal{X}, \mathcal{A})$. For a measurable function $f : \mathcal{X} \to \mathbb{R}$, we denote

$$\mathbb{P}_n f = \frac{1}{n}\sum_{i=1}^n f(X_i), \qquad Pf = \int f(x)\,dP(x),$$

$$\mathbb{G}_n f = n^{-1/2}\sum_{i=1}^n \{f(X_i) - Pf\} = n^{1/2}(\mathbb{P}_n - P)f.$$

Function $f$ can be replaced by a random function $x \mapsto f_n(x; X_1, \dots, X_n)$. Thus,

$$\mathbb{P}_n f_n = \frac{1}{n}\sum_{i=1}^n f_n(X_i; X_1, \dots, X_n), \qquad Pf_n = \int f_n(x; X_1, \dots, X_n)\,dP(x),$$

and

$$\mathbb{G}_n f_n = n^{-1/2}\sum_{i=1}^n \{f_n(X_i; X_1, \dots, X_n) - Pf_n\} = n^{1/2}(\mathbb{P}_n - P)f_n.$$
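
As a concrete reading of this notation, the following Python sketch computes the empirical measure and the empirical process for a fixed function. The names `P_n` and `G_n` simply mirror the symbols above; supplying the population mean `Pf` as an argument is an assumption made for illustration, since in practice it is unknown.

```python
import math

def P_n(f, xs):
    """Empirical measure: the sample average of f over the observations xs."""
    return sum(f(x) for x in xs) / len(xs)

def G_n(f, xs, Pf):
    """Empirical process n^{1/2} (P_n - P) f, with the true mean Pf supplied by the caller."""
    return math.sqrt(len(xs)) * (P_n(f, xs) - Pf)
```

For a random function $f_n$ the same code applies once the function has been built from the sample, with `Pf` then denoting the integral of that fixed function against $P$.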



3.1. Case-cohort study: the Cox model 

For the Cox model with external time-dependent covariates, we have

$$\lambda(t \mid \bar Z(t)) = \lambda_0(t)\,e^{\theta_0' Z(t)}$$

and

$$1 - F_{T \mid \bar Z(t)}(t) = \exp\Big\{-\int_0^t e^{\theta_0' Z(s)}\,d\Lambda_0(s)\Big\},$$

where $F_{T \mid \bar Z(t)}$ is the conditional distribution function of $T$ given $\bar Z(t)$, $\Lambda_0$ is the baseline cumulative hazard function, and $\theta_0$ is the parameter of interest. We define the following random map

$$\Psi_n(\theta,\eta) = \frac{1}{n}\sum_{i=1}^n \Omega_i\{Z_i(Y_i) - \eta(Y_i;\theta)\}\Delta_i, \tag{3.1}$$

with true $\eta$ given by

$$\eta_0(t;\theta) = \frac{E\{Z(t)e^{\theta' Z(t)}1(Y \ge t)\}}{E\{e^{\theta' Z(t)}1(Y \ge t)\}},$$

where $\Omega_i$ are diagonal weight matrices with subject and covariate specific random weights on the diagonal that have expectation 1 given complete data $X_i = (Y_i, \Delta_i, \bar Z_i(Y_i), Z_i^*(Y_i))$. By choosing a weight matrix, we are allowed to weight each component of $\psi(X_i;\theta,\eta)$ differently, as in Kulich and Lin (2004). For notational simplicity, we consider a scalar weight $\Omega_i$ in the rest of the article. The proofs for a matrix $\Omega_i$ are almost identical. It has been shown by Andersen and Gill (1982) that $E\psi(\theta_0,\eta_0(\cdot;\theta_0)) = E[\{Z(Y) - \eta_0(Y;\theta_0)\}\Delta] = 0$. The explicit functional form of $\eta_0$ is unknown and needs to be estimated first in order to estimate $\theta$ from (3.1).

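For full-cohort data, $\Omega_i \equiv 1$, and the partial likelihood estimating function is

$$\Psi_n(\theta, \hat\eta_n^F) = \frac{1}{n}\sum_{i=1}^n \{Z_i(Y_i) - \hat\eta_n^F(Y_i;\theta)\}\Delta_i, \tag{3.2}$$

where $\hat\eta_n^F$ is an estimator of $\eta_0$ using full data, which has the following form:

$$\hat\eta_n^F(t;\theta) = \frac{\sum_{j=1}^n Z_j(t)e^{\theta' Z_j(t)}1(Y_j \ge t)}{\sum_{j=1}^n e^{\theta' Z_j(t)}1(Y_j \ge t)}.$$

For case-cohort data where the subcohort is a sub-sample of the entire cohort selected with a constant probability $\pi_i$ for all $i$, also with $\Omega_i \equiv 1$, the pseudo-likelihood estimating function of Self and Prentice (1988) is

$$\Psi_n(\theta, \hat\eta_n^{SP}) = \frac{1}{n}\sum_{i=1}^n \{Z_i(Y_i) - \hat\eta_n^{SP}(Y_i;\theta)\}\Delta_i, \tag{3.3}$$

where $\hat\eta_n^{SP}$ is an estimator of $\eta_0$ considered by Self and Prentice (1988) using the subcohort data only, which has the following form:

$$\hat\eta_n^{SP}(t;\theta) = \frac{\sum_{j \in SC} Z_j(t)e^{\theta' Z_j(t)}1(Y_j \ge t)}{\sum_{j \in SC} e^{\theta' Z_j(t)}1(Y_j \ge t)}.$$

Here $SC$ denotes the set of subjects in the subcohort.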

In order to improve efficiency, the subcohort can be chosen by stratified sampling, and furthermore, it is tempting to include failures outside the subcohort to estimate $\eta_0$, see e.g. Kalbfleisch and Lawless (1988). The corresponding estimating function then becomes

$$\Psi_n(\theta, \hat\eta_n^W) = \frac{1}{n}\sum_{i=1}^n \Omega_i(Y_i)\{Z_i(Y_i) - \hat\eta_n^W(Y_i;\theta)\}\Delta_i, \tag{3.4}$$

where $\hat\eta_n^W$ is a weighted estimator of $\eta_0$ with the following form

$$\hat\eta_n^W(t;\theta) = \frac{\sum_{j=1}^n W_j(t)Z_j(t)e^{\theta' Z_j(t)}1(Y_j \ge t)}{\sum_{j=1}^n W_j(t)e^{\theta' Z_j(t)}1(Y_j \ge t)}.$$

Here $W_i$ could also be diagonal weight matrices with subject and covariate specific random weights on the diagonal. Again for notational simplicity, we consider scalar $W_i$, which may or may not be equal to $\Omega_i$. We also require that $W_i$ have expectation 1 given complete data $X_i = (Y_i, \Delta_i, Z_i(\cdot), Z_i^*(\cdot))$. We consider a broad class of weighted problems by allowing both weights $\Omega$ and $W$ to be time-dependent. The commonly used weights, originally proposed by Kalbfleisch and Lawless (1988), are the inverse-probability weights

$$W_i = \Delta_i + \frac{R_i}{\pi_i}(1 - \Delta_i), \tag{3.5}$$

where $\pi_i$ can be time-dependent, see Kulich and Lin (2004) for example.
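
The weights (3.5) and the weighted risk-set average can be sketched as follows. This is a minimal illustration that assumes a scalar, time-independent covariate and time-independent weights; the function names `ipw_weights` and `eta_hat` are hypothetical. Subjects with weight zero may have a missing covariate (`None`), since they never enter the sums.

```python
import math

def ipw_weights(delta, R, pi):
    """Weights of the form (3.5): failures get weight 1, sampled non-failures get 1/pi."""
    return [d + (1 - d) * r / p for d, r, p in zip(delta, R, pi)]

def eta_hat(t, theta, Y, Z, W):
    """Weighted average of Z over the risk set {j : Y_j >= t}, tilted by exp(theta * Z_j)."""
    num = den = 0.0
    for y, z, w in zip(Y, Z, W):
        if w > 0 and y >= t:          # zero-weight subjects contribute nothing
            e = w * math.exp(theta * z)
            num += e * z
            den += e
    return num / den                  # assumes some positively weighted subject is at risk
```

Setting all weights to 1 gives the full-cohort average, and setting them to $R_j/\pi$ gives a subcohort-only average, so the same function covers the three estimators of $\eta_0$ discussed above.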

Note that the estimating functions in (3.2) and (3.3) can be expressed by us- 
ing counting process stochastic integrals and martingale theory applies in deriv- 
ing asymptotic properties of corresponding estimators, see e.g. Andersen and Gill 
(1982) and Self and Prentice (1988). Using a similar stochastic integral for the 
estimating function (3.4) with weights (3.5), however, creates a measurability 
problem because the integrand is no longer adapted to any meaningful filtration 
(and hence not predictable). See e.g. Chung and Williams (1990) and Protter 
(2004) for detailed discussions on stochastic integration. In this article, instead 
of using stochastic integrals, we give a rigorous proof of asymptotic properties 
of the estimators obtained from the estimating function (3.4) using the general 
Z-estimation theory provided in Section 2. 

Using two possibly different weights $\Omega_i$ and $W_i$ grants great flexibility in estimating $\theta$ from equation (3.4). When $\Omega_i = W_i \equiv 1$, the estimating function $\Psi_n(\theta, \hat\eta_n^W(\cdot;\theta))$ reduces to (3.2); that is, the partial likelihood estimating function of Cox (1972) for full-cohort data. When $\Omega_i \equiv 1$ and $W_i = R_i/\pi_i$ with constant $\pi_i = \pi > 0$ for all $i$, $\Psi_n(\theta, \hat\eta_n^W(\cdot;\theta))$ becomes (3.3); that is, the pseudo-likelihood estimating function of Self and Prentice (1988). When $\Omega_i = W_i$ and they take the form in (3.5), $\Psi_n(\theta, \hat\eta_n^W(\cdot;\theta))$ is equivalent to the weighted estimating function of Kalbfleisch and Lawless (1988). When $\Omega_i = W_i = R_i^*/\pi_i^*$, where $R_i^*$ is the indicator that equals 1 if subject $i$ has complete data and 0 otherwise, and $\pi_i^* = P(R_i^* = 1 \mid X_i)$, $\Psi_n(\theta, \hat\eta_n^W(\cdot;\theta))$ becomes the estimating function proposed by Pugh, Robins, Lipsitz and Harrington (1992), which can be derived from a weighted likelihood method for a two-phase design. The corresponding asymptotic properties have been studied by Breslow and Wellner (2007) for both independent stratified Bernoulli sampling and finite population stratified sampling when covariates are time-independent. To improve efficiency, Kulich and Lin (2004) considered the estimating function $\Psi_n(\theta, \hat\eta_n^W(\cdot;\theta))$ with $\Omega_i \equiv 1$ and $W_i$ being time-dependent weights. A clear advantage of introducing weights $\Omega_i$ in $\Psi_n(\theta, \hat\eta_n^W(\cdot;\theta))$ is that it allows one to estimate $\theta$ from a data set in which some failures may have missing data, e.g. the two-phase design studied by Breslow and Wellner (2007). This is more general than a traditional case-cohort study, which requires all failures to be completely observed. It is obvious that all the above weights are nonnegative and bounded, have unit conditional expectation given complete data by Assumption (E), and are equal to zero if the corresponding covariates are missing. We will assume this holds throughout the rest of the paper.
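
To make the role of the two weights concrete, the following Python sketch evaluates the estimating function (3.4) and solves it by bisection. It is an illustration under stated assumptions rather than the procedure of any cited author: the covariate is scalar and time-independent, the weights do not vary with time, every failure with a positive weight $\Omega_i$ also has a positive weight $W_i$, and the names `psi_n`, `solve_theta` and the bracket $[-5, 5]$ are hypothetical. Bisection is justified here because, with nonnegative weights, the weighted risk-set average is nondecreasing in $\theta$, so the estimating function is nonincreasing.

```python
import math

def eta_hat(t, theta, Y, Z, W):
    """W-weighted, exp(theta * Z)-tilted average of Z over the risk set {j : Y_j >= t}."""
    num = den = 0.0
    for y, z, w in zip(Y, Z, W):
        if w > 0 and y >= t:
            e = w * math.exp(theta * z)
            num += e * z
            den += e
    return num / den

def psi_n(theta, Y, delta, Z, Omega, W):
    """Estimating function of the form (3.4) for scalar, time-independent Z and weights."""
    total = 0.0
    for y, d, z, om in zip(Y, delta, Z, Omega):
        if d == 1 and om > 0:         # only failures with observed covariates contribute
            total += om * (z - eta_hat(y, theta, Y, Z, W))
    return total / len(Y)

def solve_theta(Y, delta, Z, Omega, W, lo=-5.0, hi=5.0, iters=60):
    """Bisection on [lo, hi]; psi_n is nonincreasing in theta for nonnegative weights."""
    if psi_n(lo, Y, delta, Z, Omega, W) * psi_n(hi, Y, delta, Z, Omega, W) > 0:
        raise ValueError("no sign change on the bracket")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if psi_n(mid, Y, delta, Z, Omega, W) > 0:
            lo = mid                  # the root lies to the right of mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With all weights equal to 1 this solves the ordinary partial likelihood score equation; passing inverse-probability weights for `W`, for `Omega`, or for both gives the case-cohort variants listed above.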

Proposition 3.1. Let $\hat\eta_n(t;\theta) = \hat\eta_n^W(t;\theta)$ as in equation (3.4). Suppose the weight process $W(t)$ has bounded sample paths of bounded variation. Then both $\hat\eta_n(t;\theta)$ and $\eta_0(t;\theta)$ belong to a Donsker class, and further we have $\|\hat\eta_n - \eta_0\| = O_{p^*}(n^{-1/2})$.

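Proof: We consider one nuisance parameter $\eta$ for simplicity. The vector $\eta$ can be dealt with by examining each of its components. Define

$$D_n^{(0)}(t,\theta) = \mathbb{P}_n\{W(t)e^{\theta' Z(t)}1(Y \ge t)\},$$

$$d^{(0)}(t,\theta) = P\{W(t)e^{\theta' Z(t)}1(Y \ge t)\} = P\{e^{\theta' Z(t)}1(Y \ge t)\},$$

and

$$D_n^{(1)}(t,\theta) = \mathbb{P}_n\{W(t)Z(t)e^{\theta' Z(t)}1(Y \ge t)\},$$

$$d^{(1)}(t,\theta) = P\{W(t)Z(t)e^{\theta' Z(t)}1(Y \ge t)\} = P\{Z(t)e^{\theta' Z(t)}1(Y \ge t)\}.$$

Then we have

$$\hat\eta_n(t;\theta) = \frac{D_n^{(1)}(t,\theta)}{D_n^{(0)}(t,\theta)}, \qquad \eta_0(t;\theta) = \frac{d^{(1)}(t,\theta)}{d^{(0)}(t,\theta)}.$$

Apparently the sets of functions $\mathcal{F}_0 = \{W(t)1(Y \ge t)e^{\theta' Z(t)} : 0 \le t \le \tau, \theta \in \Theta\}$ and $\mathcal{F}_1 = \{W(t)1(Y \ge t)Z(t)e^{\theta' Z(t)} : 0 \le t \le \tau, \theta \in \Theta\}$ are well-behaved and belong to Donsker classes, see e.g. van der Vaart and Wellner (1996), Section 2.10. Hence we have that $n^{1/2}\{D_n^{(k)}(t,\theta) - d^{(k)}(t,\theta)\}$ converge weakly to zero mean Gaussian processes, and $\|D_n^{(k)} - d^{(k)}\| = O_{p^*}(n^{-1/2})$, $k = 0, 1$. Let $\bar{\mathcal{F}}_k$ be the closure of $\mathcal{F}_k$, $k = 0, 1$, respectively, in which the convergence is both pointwise and in $L_2(P)$. Then $D_n^{(k)}(t,\theta)$ and $d^{(k)}(t,\theta)$ are in the convex hull of $\bar{\mathcal{F}}_k$, $k = 0, 1$, and thus Donsker. See e.g. van der Vaart and Wellner (1996), Theorems 2.10.2 and 2.10.3. Hence both $\{\hat\eta_n(t;\theta)\}$ and $\{\eta_0(t;\theta)\}$ are Donsker by van der Vaart and Wellner (1996), Example 2.10.9, where $D_n^{(0)}$ and $d^{(0)}$ are bounded away (almost surely) from zero by Assumption (C).

Now we verify that $\hat\eta_n$ is $n^{1/2}$-consistent by the following calculation:

$$\begin{aligned}
n^{1/2}\{\hat\eta_n(t;\theta) - \eta_0(t;\theta)\}
&= n^{1/2}\Big\{\frac{D_n^{(1)}(t,\theta)}{D_n^{(0)}(t,\theta)} - \frac{d^{(1)}(t,\theta)}{d^{(0)}(t,\theta)}\Big\} \\
&= \frac{1}{d^{(0)}(t,\theta)}\,n^{1/2}\{D_n^{(1)}(t,\theta) - d^{(1)}(t,\theta)\}
 - \frac{D_n^{(1)}(t,\theta)}{D_n^{(0)}(t,\theta)\,d^{(0)}(t,\theta)}\,n^{1/2}\{D_n^{(0)}(t,\theta) - d^{(0)}(t,\theta)\} \\
&= O_{p^*}(1).
\end{aligned}$$

Since the classes of functions $\{W(t)\}$, $\{1(Y \ge t)\}$, $\{Z(t)\}$, and $\{e^{\theta' Z(t)}\}$ are all Donsker, and $\eta_0$ is a bounded deterministic function, we know that the class $\{W(t)\{Z(t) - \eta_0(t;\theta)\}e^{\theta' Z(t)}1(Y \ge t)\}$ is Donsker (see e.g. van der Vaart and Wellner (1996), Section 2.10). We then obtain the desired result. □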

Proposition 3.2. Assume the conditions in Proposition 3.1 and suppose the weight process $\Omega(t)$ also has bounded sample paths of bounded variation. Then the root of function (3.4), denoted as $\hat\theta_n$, is a consistent estimator of $\theta_0$.

Proof: We prove by verifying the conditions in Lemma 2.1. The uniqueness of $\theta_0$ as a root of $\Psi(\theta,\eta_0(\cdot;\theta))$ is proved by Andersen and Gill (1982); here $\Psi(\theta,\eta_0(\cdot;\theta))$ corresponds to the derivative of the limit of their function (2.7). The uniform consistency of $\hat\eta_n$ is given by Proposition 3.1. Now we verify condition (2.3) by the following argument. Again we consider one-dimensional $\eta$ for simplicity. Suppose that $\Omega_i \le K < \infty$ for all $i$ for a constant $K$. Let $\|\eta - \eta_0\| \le \delta_n \downarrow 0$. Then we have

$$\begin{aligned}
|\Psi_n(\theta,\eta(\cdot;\theta)) - \Psi(\theta,\eta_0(\cdot;\theta))|
&= \big|\mathbb{P}_n[\Omega(Y)\{Z(Y) - \eta(Y;\theta)\}\Delta] - P[\Omega(Y)\{Z(Y) - \eta_0(Y;\theta)\}\Delta]\big| \\
&\le \big|\mathbb{P}_n[\Omega(Y)Z(Y)\Delta] - P[\Omega(Y)Z(Y)\Delta]\big| \\
&\quad + \big|\mathbb{P}_n[\Omega(Y)\{\eta(Y;\theta) - \eta_0(Y;\theta)\}\Delta]\big| \\
&\quad + \big|(\mathbb{P}_n - P)[\Omega(Y)\eta_0(Y;\theta)\Delta]\big|.
\end{aligned}$$

The first term on the right hand side of the above inequality converges to zero in probability by the weak law of large numbers. The second term satisfies

$$\big|\mathbb{P}_n[\Omega(Y)\{\eta(Y;\theta) - \eta_0(Y;\theta)\}\Delta]\big| \le \mathbb{P}_n[\Omega(Y)\|\eta - \eta_0\|\Delta] \le K\delta_n \to 0$$

uniformly over $\theta$. And the last term converges uniformly to zero in outer probability because $\{\eta_0(t;\theta) : 0 \le t \le \tau, \theta \in \Theta\}$ is a Donsker class as we argued in the proof of Proposition 3.1, and both $\{\Omega(t)\}$ and $\{\Delta\}$ are also Donsker, thus $\{\Omega(t)\eta_0(t;\theta)\Delta\}$ is Donsker and hence a Glivenko-Cantelli class. □

Proposition 3.3. Assume the conditions in Propositions 3.1 and 3.2. Then the root of function (3.4) is asymptotically Gaussian, i.e., $n^{1/2}(\hat\theta_n - \theta_0)$ converges in distribution to a zero mean Gaussian random variable with asymptotic variance $A^{-1}B(A^{-1})'$, where

$$A = -\frac{\partial}{\partial\theta}\Psi(\theta,\eta_0(\cdot;\theta))\Big|_{\theta=\theta_0}$$

and

$$B = P\Big[\Omega(Y)\{Z(Y) - \eta_0(Y;\theta_0)\}\Delta - \int_0^\tau W(t)\{Z(t) - \eta_0(t;\theta_0)\}e^{\theta_0' Z(t)}1(Y \ge t)\,d\Lambda_0(t)\Big]^{\otimes 2},$$

where $a^{\otimes 2} = aa'$.

Proof: Let $\mathcal{H}_0$ defined in Lemma 2.2 consist of the functions $\eta_0$ and $\hat\eta_n = \hat\eta_n^W$, thus a Donsker class. Obviously the class of functions $\{\psi(\theta,\eta(t;\theta)) = \Omega(t)\{Z(t) - \eta(t;\theta)\}\Delta : \theta \in \Theta_0, \eta \in \mathcal{H}_0, 0 \le t \le \tau\}$ is a Donsker class that satisfies $P_0|\psi(\theta,\eta) - \psi(\theta_0,\eta_0)|^2 \to 0$ as $|\theta - \theta_0| \to 0$ and $\|\eta - \eta_0\| \to 0$ by the dominated convergence theorem. The Fréchet differentiability of $\{\Psi(\theta,\eta(\cdot;\theta)) : \theta \in \Theta_0, \eta \in \mathcal{H}_0\}$ can be verified easily. Thus from Propositions 3.1, 3.2 and the remark following Lemma 2.2 together with Assumption (D), we have all the conditions in Lemma 2.2 satisfied and thus equation (2.6) holds.
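Now we calculate the right hand side of equation (2.6) for the Cox model. Interchanging differentiation and integration yields

$$\begin{aligned}
&n^{1/2}\dot\Psi_2(\theta_0,\eta_0(\cdot;\theta_0))[(\hat\eta_n - \eta_0)(\cdot;\theta_0)] \\
&= -n^{1/2}P[\Omega(Y)\{\hat\eta_n(Y;\theta_0) - \eta_0(Y;\theta_0)\}\Delta] \\
&= -\int_0^\tau \Big[\frac{1}{d^{(0)}(t,\theta_0)}\,n^{1/2}\{D_n^{(1)}(t,\theta_0) - d^{(1)}(t,\theta_0)\}
 - \frac{D_n^{(1)}(t,\theta_0)}{D_n^{(0)}(t,\theta_0)\,d^{(0)}(t,\theta_0)}\,n^{1/2}\{D_n^{(0)}(t,\theta_0) - d^{(0)}(t,\theta_0)\}\Big]\,dP_{Y,\Delta}(t,1) \\
&= -\int_0^\tau \frac{1}{d^{(0)}(t,\theta_0)}\Big[n^{1/2}\{D_n^{(1)}(t,\theta_0) - d^{(1)}(t,\theta_0)\}
 - \eta_0(t;\theta_0)\,n^{1/2}\{D_n^{(0)}(t,\theta_0) - d^{(0)}(t,\theta_0)\}\Big]\,dP_{Y,\Delta}(t,1) + o_{p^*}(1) \\
&= -\int_0^\tau \mathbb{G}_n\big[W(t)\{Z(t) - \eta_0(t;\theta_0)\}e^{\theta_0' Z(t)}1(Y \ge t)\big]\,\{d^{(0)}(t,\theta_0)\}^{-1}\,dP_{Y,\Delta}(t,1) + o_{p^*}(1).
\end{aligned}$$

The above second equality holds because $E(\Omega \mid X) = 1$, and the third equality holds because the absolute difference between the two sides except the term $o_{p^*}(1)$ becomes

$$\begin{aligned}
&\Big|\int_0^\tau \{\hat\eta_n(t;\theta_0) - \eta_0(t;\theta_0)\}\frac{1}{d^{(0)}(t,\theta_0)}\,n^{1/2}\{D_n^{(0)}(t,\theta_0) - d^{(0)}(t,\theta_0)\}\,dP_{Y,\Delta}(t,1)\Big| \\
&\le \sup_{t \le \tau}|\hat\eta_n(t;\theta_0) - \eta_0(t;\theta_0)| \times \sup_{t \le \tau}\Big|\frac{1}{d^{(0)}(t,\theta_0)}\,n^{1/2}\{D_n^{(0)}(t,\theta_0) - d^{(0)}(t,\theta_0)\}\Big| \\
&= o_{p^*}(1) \cdot O_{p^*}(1) = o_{p^*}(1)
\end{aligned}$$

by Proposition 3.1 and tail bounds for the supremum of empirical processes in van der Vaart and Wellner (1996), Section 2.14.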

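Let $G(t \mid \bar z(t))$ be the conditional distribution function of the censoring time $C$ at $t$ given $\bar Z(t) = \bar z(t)$, or equivalently given $\bar Z(\tau) = \bar z(\tau)$ where $t \le \tau$, and let $H_t$ denote the distribution function of $\bar Z(t)$. Then

$$\begin{aligned}
d^{(0)}(t,\theta_0)
&= E\big[e^{\theta_0' Z(t)}E\{1(Y \ge t) \mid \bar Z(t)\}\big] \\
&= E\big[e^{\theta_0' Z(t)}P(T \ge t \mid \bar Z(t))\,P(C \ge t \mid \bar Z(t))\big] \\
&= \int e^{\theta_0' z(t)}\exp\Big\{-\int_0^t e^{\theta_0' z(s)}\,d\Lambda_0(s)\Big\}\{1 - G(t- \mid \bar z(t))\}\,dH_t(\bar z(t)).
\end{aligned}$$

On the other hand, from the joint distribution of $(Y, \Delta, \bar Z(Y))$, or equivalently of $(Y, \Delta, \bar Z(\tau))$, we obtain

$$\begin{aligned}
dP_{Y,\Delta}(t,1)
&= \int e^{\theta_0' z(t)}\exp\Big\{-\int_0^t e^{\theta_0' z(s)}\,d\Lambda_0(s)\Big\}\{1 - G(t- \mid \bar z(t))\}\,dH_t(\bar z(t))\;d\Lambda_0(t) \\
&= d^{(0)}(t,\theta_0)\,d\Lambda_0(t).
\end{aligned}$$

Thus we have

$$n^{1/2}\dot\Psi_2(\theta_0,\eta_0(\cdot;\theta_0))[(\hat\eta_n - \eta_0)(\cdot;\theta_0)]
= -\mathbb{G}_n\int_0^\tau \big[W(t)\{Z(t) - \eta_0(t;\theta_0)\}e^{\theta_0' Z(t)}1(Y \ge t)\big]\,d\Lambda_0(t) + o_{p^*}(1).$$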

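It is obvious that $\dot\Psi_1 = 0$, and by interchanging differentiation and integration we have

$$\dot\Psi_2(\theta_0,\eta_0(\cdot;\theta_0))[\dot\eta_0(\cdot;\theta_0)] = -P\Delta\dot\eta_0(Y;\theta_0) = -P\Big\{\Delta\,\frac{\partial}{\partial\theta}\eta_0(Y;\theta)\Big|_{\theta=\theta_0}\Big\}.$$

Then by equality (2.6) we have

$$n^{1/2}(\hat\theta_n - \theta_0) = A^{-1}\,\mathbb{G}_n\Big[\Omega(Y)\{Z(Y) - \eta_0(Y;\theta_0)\}\Delta - \int_0^\tau W(t)\{Z(t) - \eta_0(t;\theta_0)\}e^{\theta_0' Z(t)}1(Y \ge t)\,d\Lambda_0(t)\Big] + o_{p^*}(1), \tag{3.6}$$

which converges in distribution to a zero mean Gaussian random variable by the central limit theorem for i.i.d. data. □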

It is worth noting that equation (3.6) reduces to the asymptotic representation of the partial likelihood estimator of Cox (1972) when $\Omega_i = W_i = 1$ for all $i$. It also reduces to the asymptotic representation of Self and Prentice (1988) when $\Omega_i \equiv 1$ and $W_i$ is the inverse selection probability weight of subject $i$ into the subcohort, and of Breslow and Wellner (2007) when $\Omega_i$ and $W_i$ are the inverse selection probability weights in a two-phase sampling design. Note that the estimators discussed here are generally not semiparametric efficient except in the case of full-cohort data where $\Omega_i = W_i = 1$ for all $i$. Finding the most efficient estimator is not our focus here. We refer to Nan, Emond, and Wellner (2004) for calculations of information bounds and Nan (2004) for an efficient estimator when covariates are discrete.

The above calculation only considers the situation where the weights $\Omega_i$ and $W_i$ are given for each $i$. It has been shown in the missing data literature that using estimated rather than known weights can improve efficiency,
see e.g. Robins, Rotnitzky, and Zhao (1994), Breslow and Wellner (2007), and 
Li and Nan (2011). In particular, Breslow and Wellner (2007) showed that, 
for the Cox model with time-independent covariates, the weighted estimator 
from a finite population sampling has the same asymptotic distribution as the 
weighted estimator from an i.i.d. Bernoulli sampling with the same selection 
probability but using the estimated weights. The asymptotic variance is smaller 
than that obtained using the true weights for the case of i.i.d. sampling. The 
same property holds for the Cox model with time-dependent covariates and 
time-dependent weights in the case of i.i.d. sampling. The detailed calculation 
follows Breslow and Wellner (2007) and is left to the interested readers. 

3.2. Case-cohort study: the additive hazards model 

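Lin and Ying (1994) proposed the additive hazards model in which the hazard function given covariate history $\bar Z(t)$ is

$$\lambda(t \mid \bar z(t)) = \lambda_0(t) + \theta_0' z(t),$$

where $\lambda_0$ is the baseline hazard and $\theta_0$ is the parameter of interest. This model allows one to estimate the covariate effect on the absolute risk. Define the following random map:

$$\Psi_n(\theta,\eta) = \frac{1}{n}\sum_{i=1}^n \Big[\Omega_i(Y_i)\{Z_i(Y_i) - \eta(Y_i)\}\Delta_i - \int_0^\tau \Omega_i(t)\{Z_i(t) - \eta(t)\}1(Y_i \ge t)\,\theta' Z_i(t)\,dt\Big], \tag{3.7}$$

with

$$\eta_0(t) = \frac{E\{Z(t)1(Y \ge t)\}}{E\{1(Y \ge t)\}},$$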
where $\Omega_i$ are defined in the same way as in the previous subsection for the Cox model. Then the estimating function proposed by Lin and Ying (1994) can be viewed as the above function (3.7) with $\Omega_i \equiv 1$ and $\eta_0$ being estimated empirically, which has the following form:

$$\Psi_n(\theta,\hat\eta_n^F) = \frac{1}{n}\sum_{i=1}^n \Big[\{Z_i(Y_i) - \hat\eta_n^F(Y_i)\}\Delta_i - \int_0^\tau \{Z_i(t) - \hat\eta_n^F(t)\}1(Y_i \ge t)\,\theta' Z_i(t)\,dt\Big], \tag{3.8}$$

with

$$\hat\eta_n^F(t) = \frac{\sum_{j=1}^n Z_j(t)1(Y_j \ge t)}{\sum_{j=1}^n 1(Y_j \ge t)}.$$

Note that neither $\eta_0$ nor $\hat\eta_n^F$ involves $\theta$. The estimator of $\theta$ has an explicit form:

$$\hat\theta_n = \Big[\sum_{i=1}^n \int_0^\tau \{Z_i(t) - \hat\eta_n^F(t)\}^{\otimes 2}1(Y_i \ge t)\,dt\Big]^{-1}\sum_{i=1}^n \{Z_i(Y_i) - \hat\eta_n^F(Y_i)\}\Delta_i. \tag{3.9}$$

Lin and Ying (1994) defined the above $\Psi_n(\theta,\hat\eta_n^F)$ and $\hat\theta_n$ using the stochastic integral formulation and studied their asymptotic properties using martingale theory.

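For case-cohort studies, Kulich and Lin (2000) modified the estimating function (3.8) and proposed the following estimating function (with $\Omega_i = W_i$):

$$\Psi_n(\theta,\hat\eta_n^W) = \frac{1}{n}\sum_{i=1}^n \Big[\Omega_i(Y_i)\{Z_i(Y_i) - \hat\eta_n^W(Y_i)\}\Delta_i - \int_0^\tau \Omega_i(t)\{Z_i(t) - \hat\eta_n^W(t)\}1(Y_i \ge t)\,\theta' Z_i(t)\,dt\Big], \tag{3.10}$$

with

$$\hat\eta_n^W(t) = \frac{\sum_{j=1}^n W_j(t)Z_j(t)1(Y_j \ge t)}{\sum_{j=1}^n W_j(t)1(Y_j \ge t)}. \tag{3.11}$$

The estimator again has an explicit form

$$\hat\theta_n = \Big[\frac{1}{n}\sum_{i=1}^n \int_0^\tau \Omega_i(t)\{Z_i(t) - \hat\eta_n^W(t)\}Z_i(t)'1(Y_i \ge t)\,dt\Big]^{-1} \times \frac{1}{n}\sum_{i=1}^n \Omega_i(Y_i)\{Z_i(Y_i) - \hat\eta_n^W(Y_i)\}\Delta_i. \tag{3.12}$$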
Here we have extended the method of Kulich and Lin (2000) by introducing two weight matrices $\Omega$ and $W$ in (3.7) and (3.11), respectively, as in the previous subsection.
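
The closed form (3.12) is easy to compute when the covariate is scalar and time-independent and the weights do not vary with time, because the weighted risk-set average is then piecewise constant between ordered observation times. The Python sketch below makes those simplifying assumptions, takes the end of follow-up to be the largest observed time, and assumes that any subject with a positive $\Omega_i$ also has a positive $W_i$; the function names are hypothetical.

```python
def eta_hat(t, Y, Z, W):
    """W-weighted average of Z over the risk set {j : Y_j >= t}, as in (3.11)."""
    num = den = 0.0
    for y, z, w in zip(Y, Z, W):
        if w > 0 and y >= t:
            num += w * z
            den += w
    return num / den

def additive_hazards_estimate(Y, delta, Z, Omega, W):
    """Closed-form estimator of the form (3.12) for scalar, time-independent Z and weights."""
    obs = [(y, d, z, om) for y, d, z, om in zip(Y, delta, Z, Omega) if om > 0]
    # Numerator: weighted sum over failures of Z_i minus the risk-set average at Y_i.
    num = sum(om * (z - eta_hat(y, Y, Z, W)) for y, d, z, om in obs if d == 1)
    # Denominator: the integrand is constant on each interval (prev, t] between observed times.
    den, prev = 0.0, 0.0
    for t in sorted(set(Y)):
        at_risk = [(z, om) for y, d, z, om in obs if y >= t]
        if at_risk:
            e = eta_hat(t, Y, Z, W)
            den += (t - prev) * sum(om * (z - e) * z for z, om in at_risk)
        prev = t
    return num / den
```

With all weights equal to 1 the function returns the full-cohort estimator (3.9), since the cross term involving the risk-set average sums to zero over each risk set.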

When weights $W_i$ or $\Omega_i$ depend on $\Delta_i$ as in (3.5), for the same reason as in the previous example, martingale theory does not apply. Here we provide a proof without using stochastic integrals. As we assumed for the Cox model, $\Omega_i$ and $W_i$ are nonnegative with unit conditional expectation given complete data.

We consider the weighted estimating function (3.10) that reduces to (3.8) when $\Omega_i = W_i = 1$ for all $i$. Without loss of generality, we assume a one-dimensional covariate $Z$ and thus a one-dimensional $\theta$ in the following calculation. The multi-dimensional case is a straightforward extension.

Proposition 3.4. Let $\hat\eta_n(t)$ be defined as in equation (3.10). Suppose the weight
process $W(t)$ has bounded sample paths of bounded variation. Then both $\hat\eta_n(t)$
and $\eta_0(t)$ belong to a Donsker class, and further we have $\|\hat\eta_n - \eta_0\| = O_{p^*}(n^{-1/2})$.

Proof: This is a direct consequence of Proposition 3.1 with $\theta = 0$. □



Proposition 3.5. Assume the conditions in Proposition 3.4 and suppose the
weight process $\Omega(t)$ also has bounded sample paths of bounded variation. Then
the root of function (3.10) is a consistent estimator of $\theta_0$.

Proof: Similar to the proof of Proposition 3.2, we only need to verify those
conditions in Lemma 2.1. Obviously $\Psi(\theta, \eta_0) = P\{\psi(\theta, \eta_0)\}$ is a linear function
of $\theta$ with a non-zero slope by Assumption (D), hence $\theta_0$ is the unique solution
of $\Psi(\theta, \eta_0) = 0$. Proposition 3.4 provides the uniform consistency of $\hat\eta_n$. We now
verify condition (2.3). Let $\|\eta - \eta_0\| \le \delta_n \downarrow 0$. We have

$$\begin{aligned}
|\Psi_n(\theta, \eta) - \Psi(\theta, \eta_0)|
\le {} & \left|\mathbb{P}_n[\Omega(Y)\{Z(Y) - \eta(Y)\}\Delta] - P[\Omega(Y)\{Z(Y) - \eta_0(Y)\}\Delta]\right| \\
& + \left|\mathbb{P}_n\int_0^\tau \Omega(t)\{Z(t) - \eta(t)\}1(Y \ge t)\theta Z(t)\,dt - P\int_0^\tau \Omega(t)\{Z(t) - \eta_0(t)\}1(Y \ge t)\theta Z(t)\,dt\right| \\
\le {} & \left|(\mathbb{P}_n - P)[\Omega(Y)\{Z(Y) - \eta_0(Y)\}\Delta]\right| + \left|\mathbb{P}_n[\Omega(Y)\{\eta(Y) - \eta_0(Y)\}\Delta]\right| \\
& + \left|(\mathbb{P}_n - P)\int_0^\tau \Omega(t)\{Z(t) - \eta_0(t)\}1(Y \ge t)\theta Z(t)\,dt\right| + \left|\mathbb{P}_n\int_0^\tau \Omega(t)\{\eta(t) - \eta_0(t)\}1(Y \ge t)\theta Z(t)\,dt\right| \\
\le {} & \left|(\mathbb{P}_n - P)[\Omega(Y)\{Z(Y) - \eta_0(Y)\}\Delta]\right| + \left|(\mathbb{P}_n - P)\int_0^\tau \Omega(t)\{Z(t) - \eta_0(t)\}1(Y \ge t)\theta Z(t)\,dt\right| \\
& + \delta_n \mathbb{P}_n\left\{\Omega(Y)\Delta + \int_0^\tau \Omega(t)1(Y \ge t)|\theta Z(t)|\,dt\right\},
\end{aligned}$$

in which the first two terms on the right hand side of the last inequality converge
to zero in probability by the weak law of large numbers, and the third term
converges to zero because $\delta_n \to 0$. We then have the desired result by Lemma
2.1. □

Proposition 3.6. Assume the conditions in Propositions 3.4 and 3.5. Then
the root of function (3.10), given in (3.12), is asymptotically Gaussian, i.e.,
$n^{1/2}(\hat\theta_n - \theta_0)$ converges in distribution to a zero-mean Gaussian random
variable.

Proof: The proof can proceed either from (3.12) directly or by using Corollary
2.1. We show the latter. Similar to the proof of Proposition 3.3, the Fréchet
differentiability of $\{\Psi(\theta, \eta) : \theta \in \Theta_0, \eta \in H_0\}$ can be verified easily. Obviously
the set $\{\Omega(t)\Delta\{Z(t) - \eta(t)\} : \eta \in H_0, 0 \le t \le \tau\}$ is Donsker, thus we only
need to show the class of functions $\{\int_0^\tau \Omega(t)\{Z(t) - \eta(t)\}1(Y \ge t)\theta Z(t)\,dt : \theta \in \Theta_0, \eta \in H_0\}$
is Donsker, here $H_0$ is reduced from that in the proof of Proposition
3.3. Let $f = \int_0^\tau \Omega(t)\{Z(t) - \eta(t)\}Z(t)1(Y \ge t)\,dt$ and

$$f^m = \sum_{i=1}^{m} \Omega(t_i)\{Z(t_i) - \eta(t_i)\}Z(t_i)1(Y \ge t_i)(t_{i+1} - t_i) = \sum_{i=1}^{m} \lambda_i f_i,$$

where

$$f_i = \Omega(t_i)\{Z(t_i) - \eta(t_i)\}Z(t_i)1(Y \ge t_i), \qquad \lambda_i = t_{i+1} - t_i,$$

and $\{(t_1, t_2], \ldots, (t_m, \tau]\}$ forms a partition of the interval $(0, \tau]$. The set $\{f^m\}$
is the convex hull of $\mathcal{F} = \{f_i\}$, and thus a Donsker class by Theorem 2.10.3
in van der Vaart and Wellner (1996) since $\mathcal{F}$ is Donsker. Now we know that
$f^m \to f$ both pointwise and in $L_2(P)$ by the boundedness of $Y$ and $\eta$, then
$\{f(\cdot)\}$ is Donsker by Theorem 2.10.2 in van der Vaart and Wellner (1996).
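
The approximation of $f$ by the finite sums $f^m$ can be checked numerically for a single observation. The sketch below uses an equally spaced partition with left endpoints; the covariate path $Z(t) = t$, the constant $\eta(t) = 0.5$, the weight $\Omega(t) = 1$, and the values $Y = 1.5$, $\tau = 2$ are assumptions made purely for illustration, chosen so that $f = \int_0^{1.5}(t - 0.5)\,t\,dt = 0.5625$ is available in closed form.

```python
def riemann_f_m(m, y, tau, z_path, eta, omega=lambda t: 1.0):
    """Left-endpoint sum f^m of Omega(t){Z(t) - eta(t)} Z(t) 1(Y >= t) over m cells."""
    total = 0.0
    for i in range(m):
        t_i = i * tau / m
        t_next = (i + 1) * tau / m
        at_risk = 1.0 if y >= t_i else 0.0
        f_i = omega(t_i) * (z_path(t_i) - eta(t_i)) * z_path(t_i) * at_risk
        total += f_i * (t_next - t_i)  # lambda_i * f_i with lambda_i = t_{i+1} - t_i
    return total


if __name__ == "__main__":
    exact = 0.5625  # closed-form value of f for the assumed inputs
    for m in (20, 200, 2000):
        approx = riemann_f_m(m, 1.5, 2.0, lambda t: t, lambda t: 0.5)
        print(m, approx, abs(approx - exact))
```

The printed errors shrink as the partition is refined, which is the pointwise convergence $f^m \to f$ used in the argument above.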

We then calculate the right hand side of equation (2.9). Direct calculation
yields

$$n^{1/2}\dot\Psi_\eta(\theta_0, \eta_0)(\hat\eta_n - \eta_0) = -n^{1/2}P[\{\hat\eta_n(Y) - \eta_0(Y)\}\Delta] + n^{1/2}P\int_0^\tau \{\hat\eta_n(t) - \eta_0(t)\}1(Y \ge t)\theta_0 Z(t)\,dt \tag{3.13}$$

by applying $E\{\Omega \mid X\} = 1$. Let $d^{(0)}(t) = P\{W(t)1(Y \ge t)\} = P\{1(Y \ge t)\}$
and $d^{(1)}(t) = P\{W(t)Z(t)1(Y \ge t)\} = P\{Z(t)1(Y \ge t)\}$, where $E\{W \mid X\} = 1$.
Similar to the proof of Proposition 3.3, the first term on the right hand side of
equation (3.13) can be written as



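$$\begin{aligned}
n^{1/2}P[\{\hat\eta_n(Y) - \eta_0(Y)\}\Delta]
= {} & \mathbb{G}_n\int_0^\tau W(t)\{Z(t) - \eta_0(t)\}1(Y \ge t)\,d^{(0)}(t)^{-1}\,dP_{Y,\Delta}(t, 1) + o_{p^*}(1) \\
= {} & \mathbb{G}_n\int_0^\tau W(t)\{Z(t) - \eta_0(t)\}1(Y \ge t)\lambda_0(t)\,dt \\
& + \mathbb{G}_n\int_0^\tau W(t)\{Z(t) - \eta_0(t)\}1(Y \ge t)\theta_0\eta_0(t)\,dt + o_{p^*}(1),
\end{aligned}$$

since from the joint distribution of $(Y, \Delta, Z(Y))$ we have

$$\begin{aligned}
\frac{dP_{Y,\Delta}(t, 1)}{dt} &= \int \{\lambda_0(t) + \theta_0 z(t)\}\{1 - F(t \mid z(t))\}\{1 - G(t- \mid z(t))\}\,dH_t(z(t)) \\
&= \lambda_0(t)P\{1(Y \ge t)\} + \theta_0 P\{Z(t)1(Y \ge t)\}.
\end{aligned}$$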

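From the proof of Proposition 3.1 we have

$$n^{1/2}\{\hat\eta_n(t) - \eta_0(t)\} = d^{(0)}(t)^{-1}\mathbb{G}_n[W(t)\{Z(t) - \eta_0(t)\}1(Y \ge t)] + o_{p^*}(1),$$

so the second term on the right hand side of (3.13) can be rewritten as

$$\begin{aligned}
\int_0^\tau n^{1/2}\{\hat\eta_n(t) - \eta_0(t)\}P\{1(Y \ge t)\theta_0 Z(t)\}\,dt
&= \int_0^\tau d^{(0)}(t)^{-1}\mathbb{G}_n[W(t)\{Z(t) - \eta_0(t)\}1(Y \ge t)]\,\theta_0\,d^{(1)}(t)\,dt + o_{p^*}(1) \\
&= \mathbb{G}_n\int_0^\tau W(t)\{Z(t) - \eta_0(t)\}1(Y \ge t)\theta_0\eta_0(t)\,dt + o_{p^*}(1).
\end{aligned}$$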
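Thus from (2.9) we obtain

$$\begin{aligned}
n^{1/2}(\hat\theta_n - \theta_0) = {} & \left[P\int_0^\tau \Omega(t)\{Z(t) - \eta_0(t)\}1(Y \ge t)Z(t)\,dt\right]^{-1} \\
& \times \mathbb{G}_n\left[\Omega(Y)\{Z(Y) - \eta_0(Y)\}\Delta - \int_0^\tau \{\Omega(t)\theta_0 Z(t) + W(t)\lambda_0(t)\}\{Z(t) - \eta_0(t)\}1(Y \ge t)\,dt\right] \\
& + o_{p^*}(1),
\end{aligned} \tag{3.14}$$

which is asymptotically normal by the central limit theorem. This asymptotic
representation reduces to that in Kulich and Lin (2000) when $\Omega_i = W_i$. Again, we
do not require $\Omega_i$ and $W_i$ to be predictable.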
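
To make the reduction for $\Omega = W$ concrete, the short calculation below rewrites the term inside $\mathbb{G}_n$ in (3.14) for a single observation with $Y \le \tau$. The counting-process symbols $N(t) = \Delta 1(Y \le t)$ and $M(t)$ are introduced here only for illustration and are not notation used in this section.

```latex
\begin{aligned}
&\Omega(Y)\{Z(Y)-\eta_0(Y)\}\Delta
 - \int_0^\tau \Omega(t)\{\theta_0 Z(t)+\lambda_0(t)\}\{Z(t)-\eta_0(t)\}\,1(Y\ge t)\,dt \\
&\qquad = \int_0^\tau \Omega(t)\{Z(t)-\eta_0(t)\}\,dM(t),
\qquad dM(t) = dN(t) - 1(Y\ge t)\{\lambda_0(t)+\theta_0 Z(t)\}\,dt .
\end{aligned}
```

The first term is the integral of $\Omega(t)\{Z(t) - \eta_0(t)\}$ against $dN(t)$, and the second collects the compensator of $N$ under the additive hazards model. The display is an algebraic identity and holds whether or not $\Omega$ is predictable, which is consistent with the remark that predictability is not required here.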

4. Discussion 

We consider i.i.d. sampling for the case-cohort studies. Breslow and Wellner
(2007) have considered finite population stratified sampling and applied the
exchangeably weighted bootstrap empirical process theory of Præstgaard and Wellner
(1993) for the Cox model with time-independent covariates. The general
Z-estimation theory in Section 2 is likely to be applicable to the finite population
stratified sampling designs for time-dependent covariates.

The theory in Section 2 requires smooth $\eta$ with respect to $\theta$, which is mainly
restricted by the smoothness condition (2.4) or (2.5). For non-smooth $\eta$, for
example, the rank-based estimating function for the accelerated failure time model,
the smoothness condition does not hold. Nan, Kalbfleisch, and Yu (2009) have
shown that a similar idea for bundled parameters with missing data is applicable
to the rank-based estimator for the accelerated failure time model. For models
with bundled parameters in the original parameterization, Ding and Nan
(2011) have proposed a sieve maximum likelihood estimating method and applied
the method to the efficient estimation of the accelerated failure time model.

We have discussed two examples, the proportional hazards model and the
additive hazards model in case-cohort studies, though our method applies to
a much broader range of semiparametric estimation problems. Parameter
estimation in case-cohort studies is hard to handle by traditional martingale-based
methods when certain more efficient but unpredictable weights are considered,
but becomes straightforward using the general pseudo Z-estimation
theory.

Another point worth mentioning is that for missing data problems, the estimated
likelihood method of Pepe and Fleming (1991), the mean score method
of Reilly and Pepe (1995), and the pseudoscore method of Chatterjee, Chen, and Breslow
(2003), among others, also fit into the general Z-estimation framework nicely.
Let $Y$ be the response variable and $(Z, V)$ be covariates where $Z$ can be missing
sometimes. Let $R$ be the indicator that takes value 1 if $Z$ is observed and
0 otherwise. Let $X$ denote the observed data. Suppose that the parameter of
interest $\theta \in \Theta \subset \mathbb{R}^d$ could be estimated by using the complete data score function
$\dot\ell_\theta(\cdot; \theta)$ as the estimating function if there were no missing data. When $Z$
is sometimes missing at random (Little and Rubin, 2002), the observed
data score function for $\theta$ becomes

$$\dot\ell_\theta(X; \theta, \eta_0(\cdot; \theta)) = R\,\dot\ell_\theta(Y, Z, V; \theta) + (1 - R)\,\eta_0(Y, V; \theta),$$

where $\eta_0(Y, V; \theta) = E\{\dot\ell_\theta(Y, Z, V; \theta) \mid Y, V\}$, whose functional form is unknown.
Define $\psi(\cdot; \theta, \eta(\cdot; \theta)) = \dot\ell_\theta(\cdot; \theta, \eta(\cdot; \theta))$. Then $\psi(\cdot; \theta, \hat\eta_n(\cdot; \theta))$ becomes an estimating
function for $\theta$, where $\hat\eta_n(\cdot; \theta)$ is an estimator of $\eta_0(\cdot; \theta)$. The asymptotic
properties of the Z-estimator for $\theta$ depend on the behavior of $\hat\eta_n$ and may be
derived from the theorems given in Section 2. Authors of the aforementioned
references have proposed nonparametric methods to estimate $\eta_0(\cdot; \theta)$. Apparently
efficiency can be improved by using the weighted estimating function proposed
by Robins, Rotnitzky, and Zhao (1994). The proposed methodology may also
apply to the composite likelihoods for semiparametric models, see e.g. Lindsay
(1987) and Varin, Reid and Firth (2011), particularly for missing data
problems.
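
The structure $R\,\dot\ell_\theta + (1 - R)\,\eta_0$ can be illustrated with a deliberately simple toy model. The sketch below assumes binary $Y$, $Z$ and $V$, with $P(Y = 1 \mid Z) = \mathrm{expit}(\theta Z)$, so that the complete data score is $Z\{Y - \mathrm{expit}(\theta)\}$ and $\eta_0(Y, V; \theta) = P(Z = 1 \mid Y, V)\{Y - \mathrm{expit}(\theta)\}$. The conditional probability is estimated by the complete-case proportion within each $(Y, V)$ stratum, in the spirit of the mean score method. The model, the function names, and the simulation design are assumptions for illustration and are not taken from the cited references.

```python
import math
import random
from collections import defaultdict


def mean_score_logit(y, z, v, r):
    """Mean-score estimate of theta in P(Y=1 | Z) = expit(theta * Z), Z in {0, 1}.

    z[i] is used only where r[i] == 1 (Z observed).
    """
    # Nonparametric estimate of P(Z = 1 | Y, V) from the complete cases.
    count = defaultdict(int)
    ones = defaultdict(int)
    for yi, zi, vi, ri in zip(y, z, v, r):
        if ri == 1:
            count[(yi, vi)] += 1
            ones[(yi, vi)] += zi
    num = den = 0.0
    for yi, zi, vi, ri in zip(y, z, v, r):
        if ri == 1:
            z_tilde = zi
        else:
            if count[(yi, vi)] == 0:
                raise ValueError("no complete cases in stratum")
            z_tilde = ones[(yi, vi)] / count[(yi, vi)]
        # The score Z{Y - expit(theta)} is linear in Z, so eta_hat replaces Z by z_tilde.
        num += z_tilde * yi
        den += z_tilde
    p_hat = num / den  # root of sum z_tilde * {Y - expit(theta)} = 0
    return math.log(p_hat / (1.0 - p_hat))


def simulate_missing(n, theta0=1.0, seed=1):
    """Hypothetical design: V is a noisy surrogate for Z; missingness depends on V only."""
    rng = random.Random(seed)
    y, z, v, r = [], [], [], []
    for _ in range(n):
        zi = 1 if rng.random() < 0.5 else 0
        vi = zi if rng.random() < 0.8 else 1 - zi
        p1 = 1.0 / (1.0 + math.exp(-theta0 * zi))
        yi = 1 if rng.random() < p1 else 0
        ri = 1 if rng.random() < 0.3 + 0.4 * vi else 0  # missing at random
        y.append(yi)
        z.append(zi)
        v.append(vi)
        r.append(ri)
    return y, z, v, r


if __name__ == "__main__":
    y, z, v, r = simulate_missing(5000)
    print(mean_score_logit(y, z, v, r))
```

Because the score is linear in $Z$ in this toy model, the estimating equation has a closed-form root; in general the root of $\Psi_n(\theta, \hat\eta_n(\cdot; \theta))$ would have to be found numerically.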



Appendix: Proofs of Lemmas 2.1 and 2.2 
Proof of Lemma 2.1

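Since $\theta_0$ is the unique solution to $\Psi(\theta, \eta_0(\cdot; \theta)) = 0$, for any
fixed $\epsilon > 0$ there exists a $\delta > 0$ such that

$$P\left(|\hat\theta_n - \theta_0| > \epsilon\right) \le P\left(|\Psi(\hat\theta_n, \eta_0(\cdot; \hat\theta_n))| > \delta\right).$$

If we can prove $|\Psi(\hat\theta_n, \eta_0(\cdot; \hat\theta_n))| \to_{p^*} 0$, then the consistency of $\hat\theta_n$ will follow
immediately.

To do this, first note that since $\|\hat\eta_n - \eta_0\| = o_{p^*}(1)$, there exists a sequence
$\{\delta_n\} \downarrow 0$ such that $\|\hat\eta_n - \eta_0\| \le \delta_n$ with probability tending to one. Hence taking
$\eta = \hat\eta_n$ in equation (2.3), we have the following inequalities:

$$\begin{aligned}
|\Psi(\hat\theta_n, \eta_0(\cdot; \hat\theta_n))|
&\le |\Psi_n(\hat\theta_n, \hat\eta_n(\cdot; \hat\theta_n))| + |\Psi(\hat\theta_n, \eta_0(\cdot; \hat\theta_n)) - \Psi_n(\hat\theta_n, \hat\eta_n(\cdot; \hat\theta_n))| \\
&\le |\Psi_n(\hat\theta_n, \hat\eta_n(\cdot; \hat\theta_n))| + o_{p^*}\left(1 + |\Psi_n(\hat\theta_n, \hat\eta_n(\cdot; \hat\theta_n))| + |\Psi(\hat\theta_n, \eta_0(\cdot; \hat\theta_n))|\right) \\
&\le o_{p^*}(1) + o_{p^*}\left(1 + o_{p^*}(1) + |\Psi(\hat\theta_n, \eta_0(\cdot; \hat\theta_n))|\right),
\end{aligned}$$

which implies $|\Psi(\hat\theta_n, \eta_0(\cdot; \hat\theta_n))| = o_{p^*}(1)$. So we have proved the consistency of
pseudo Z-estimators $\hat\theta_n$. □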



Proof of Lemma 2.2 

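We first show a result that we will use in the proof: under Conditions (i) and
(ii),

$$n^{1/2}\left|\Psi(\hat\theta_n, \hat\eta_n(\cdot; \hat\theta_n))\right| = O_{p^*}(1). \tag{4.1}$$

By Condition (i), we have the following bound:

$$\begin{aligned}
n^{1/2}\left|(\Psi_n - \Psi)(\hat\theta_n, \hat\eta_n(\cdot; \hat\theta_n)) - (\Psi_n - \Psi)(\theta_0, \eta_0(\cdot; \theta_0))\right|
= {} & o_{p^*}(1) + o_{p^*}\left(n^{1/2}\left|\Psi_n(\hat\theta_n, \hat\eta_n(\cdot; \hat\theta_n))\right|\right) \\
& + o_{p^*}\left(n^{1/2}\left|\Psi(\hat\theta_n, \hat\eta_n(\cdot; \hat\theta_n))\right|\right).
\end{aligned}$$

By the triangle inequality $-|a| + |b| - |c| \le |a - b - c|$ and the fact that
$\Psi(\theta_0, \eta_0(\cdot; \theta_0)) = 0$,

$$\begin{aligned}
& n^{1/2}\left|\Psi(\hat\theta_n, \hat\eta_n(\cdot; \hat\theta_n))\right| - n^{1/2}\left|\Psi_n(\hat\theta_n, \hat\eta_n(\cdot; \hat\theta_n))\right| - n^{1/2}\left|\Psi_n(\theta_0, \eta_0(\cdot; \theta_0))\right| \\
&\quad \le n^{1/2}\left|(\Psi_n - \Psi)(\hat\theta_n, \hat\eta_n(\cdot; \hat\theta_n)) - (\Psi_n - \Psi)(\theta_0, \eta_0(\cdot; \theta_0))\right| \\
&\quad = o_{p^*}(1) + o_{p^*}\left(n^{1/2}\left|\Psi_n(\hat\theta_n, \hat\eta_n(\cdot; \hat\theta_n))\right|\right) + o_{p^*}\left(n^{1/2}\left|\Psi(\hat\theta_n, \hat\eta_n(\cdot; \hat\theta_n))\right|\right),
\end{aligned}$$

which implies

$$\begin{aligned}
n^{1/2}\left|\Psi(\hat\theta_n, \hat\eta_n(\cdot; \hat\theta_n))\right|\,[1 - o_{p^*}(1)]
&\le o_{p^*}(1) + n^{1/2}\left|\Psi_n(\hat\theta_n, \hat\eta_n(\cdot; \hat\theta_n))\right|\,[1 + o_{p^*}(1)] + n^{1/2}\left|\Psi_n(\theta_0, \eta_0(\cdot; \theta_0))\right| \\
&= o_{p^*}(1) + o_{p^*}(1) + O_{p^*}(1).
\end{aligned}$$

Hence (4.1) holds.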

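We then show the root-n consistency of $\hat\theta_n$. Since $|\hat\theta_n - \theta_0| = o_{p^*}(1)$ and
$\|\hat\eta_n - \eta_0\| = O_{p^*}(n^{-\beta})$ with $\beta > 0$, there exist a sequence $\{\delta_n\} \downarrow 0$ and $c > 0$
such that $|\hat\theta_n - \theta_0| \le \delta_n$ and $\|\hat\eta_n - \eta_0\| \le c n^{-\beta}$ with probability approaching
one. Hence taking $(\theta, \eta) = (\hat\theta_n, \hat\eta_n)$ in the smoothness condition (2.5):

$$\begin{aligned}
& n^{1/2}\left\{\Psi(\hat\theta_n, \hat\eta_n(\cdot; \hat\theta_n)) - \Psi(\theta_0, \eta_0(\cdot; \theta_0))\right\} \\
&\qquad - n^{1/2}\left\{\dot\Psi_1(\theta_0, \eta_0(\cdot; \theta_0)) + \dot\Psi_2(\theta_0, \eta_0(\cdot; \theta_0))[\dot\eta_0(\cdot; \theta_0)]\right\}(\hat\theta_n - \theta_0) \\
&\qquad - n^{1/2}\dot\Psi_2(\theta_0, \eta_0(\cdot; \theta_0))[(\hat\eta_n - \eta_0)(\cdot; \theta_0)] \\
&\quad = o_{p^*}\left(n^{1/2}|\hat\theta_n - \theta_0|\right) + O_{p^*}\left(n^{1/2}\|\hat\eta_n - \eta_0\|^{\alpha}\right) \\
&\quad = o_{p^*}\left(1 + n^{1/2}|\hat\theta_n - \theta_0|\right)
\end{aligned} \tag{4.2}$$

since $n^{1/2}O_{p^*}(\|\hat\eta_n - \eta_0\|^{\alpha}) = o_{p^*}(1)$ by $\alpha\beta > 1/2$. The same result can be obtained
by using the smoothness condition (2.4) for $\beta = 1/2$. By equation (4.1), the fact
that $\Psi(\theta_0, \eta_0(\cdot; \theta_0)) = 0$, and the triangle inequality $-|a| + |b| - |c| \le |a - b - c|$,
equation (4.2) implies

$$\begin{aligned}
& -O_{p^*}(1) + \left|n^{1/2}\left\{\dot\Psi_1(\theta_0, \eta_0(\cdot; \theta_0)) + \dot\Psi_2(\theta_0, \eta_0(\cdot; \theta_0))[\dot\eta_0(\cdot; \theta_0)]\right\}(\hat\theta_n - \theta_0)\right| \\
&\qquad - \left|n^{1/2}\dot\Psi_2(\theta_0, \eta_0(\cdot; \theta_0))[(\hat\eta_n - \eta_0)(\cdot; \theta_0)]\right|
\le o_{p^*}\left(1 + n^{1/2}|\hat\theta_n - \theta_0|\right).
\end{aligned} \tag{4.3}$$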

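Since the $d \times d$ matrix $\dot\Psi_1(\theta_0, \eta_0(\cdot; \theta_0)) + \dot\Psi_2(\theta_0, \eta_0(\cdot; \theta_0))[\dot\eta_0(\cdot; \theta_0)]$ is nonsingular,
there exists a constant $c_1 > 0$ such that

$$\left|\left\{\dot\Psi_1(\theta_0, \eta_0(\cdot; \theta_0)) + \dot\Psi_2(\theta_0, \eta_0(\cdot; \theta_0))[\dot\eta_0(\cdot; \theta_0)]\right\}(\theta - \theta_0)\right| \ge c_1|\theta - \theta_0|$$

for $|\theta - \theta_0| \to 0$. On the other hand, by Condition (iv), combination with
inequality (4.3) yields

$$\begin{aligned}
O_{p^*}(1) &\ge \left|n^{1/2}\dot\Psi_2(\theta_0, \eta_0(\cdot; \theta_0))[(\hat\eta_n - \eta_0)(\cdot; \theta_0)]\right| \\
&\ge \left|n^{1/2}\left\{\dot\Psi_1(\theta_0, \eta_0(\cdot; \theta_0)) + \dot\Psi_2(\theta_0, \eta_0(\cdot; \theta_0))[\dot\eta_0(\cdot; \theta_0)]\right\}(\hat\theta_n - \theta_0)\right| - O_{p^*}(1) - o_{p^*}\left(1 + n^{1/2}|\hat\theta_n - \theta_0|\right) \\
&\ge c_1 n^{1/2}|\hat\theta_n - \theta_0| - O_{p^*}(1) - o_{p^*}\left(1 + n^{1/2}|\hat\theta_n - \theta_0|\right) \\
&= \{c_1 - o_{p^*}(1)\}\,n^{1/2}|\hat\theta_n - \theta_0| - O_{p^*}(1).
\end{aligned}$$

Hence the sequence $n^{1/2}|\hat\theta_n - \theta_0|$ must be bounded in outer probability.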
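Now we are ready to prove equation (2.6). Because

$$\begin{aligned}
& n^{1/2}\left\{\Psi(\hat\theta_n, \hat\eta_n(\cdot; \hat\theta_n)) - \Psi(\theta_0, \eta_0(\cdot; \theta_0))\right\} \\
&\quad = n^{1/2}\left\{\Psi(\hat\theta_n, \hat\eta_n(\cdot; \hat\theta_n)) - \Psi_n(\hat\theta_n, \hat\eta_n(\cdot; \hat\theta_n))\right\} + o_{p^*}(1) \\
&\quad = -n^{1/2}(\Psi_n - \Psi)(\theta_0, \eta_0(\cdot; \theta_0)) + o_{p^*}(1) \\
&\qquad + o_{p^*}\left(1 + n^{1/2}\left|\Psi_n(\hat\theta_n, \hat\eta_n(\cdot; \hat\theta_n))\right| + n^{1/2}\left|\Psi(\hat\theta_n, \hat\eta_n(\cdot; \hat\theta_n))\right|\right) \quad \text{(by Condition (i))} \\
&\quad = -n^{1/2}(\Psi_n - \Psi)(\theta_0, \eta_0(\cdot; \theta_0)) + o_{p^*}(1) \quad \text{(by equation (4.1))},
\end{aligned} \tag{4.4}$$

after substituting equation (4.4) into the first term in the first line of equation (4.2)
we obtain

$$\begin{aligned}
& -n^{1/2}(\Psi_n - \Psi)(\theta_0, \eta_0(\cdot; \theta_0)) + o_{p^*}(1) \\
&\qquad - n^{1/2}\left\{\dot\Psi_1(\theta_0, \eta_0(\cdot; \theta_0)) + \dot\Psi_2(\theta_0, \eta_0(\cdot; \theta_0))[\dot\eta_0(\cdot; \theta_0)]\right\}(\hat\theta_n - \theta_0) \\
&\qquad - n^{1/2}\dot\Psi_2(\theta_0, \eta_0(\cdot; \theta_0))[(\hat\eta_n - \eta_0)(\cdot; \theta_0)] \\
&\quad = o_{p^*}\left(1 + n^{1/2}|\hat\theta_n - \theta_0|\right) = o_{p^*}(1),
\end{aligned}$$

which implies

$$\begin{aligned}
n^{1/2}(\hat\theta_n - \theta_0) = {} & \left\{-\dot\Psi_1(\theta_0, \eta_0(\cdot; \theta_0)) - \dot\Psi_2(\theta_0, \eta_0(\cdot; \theta_0))[\dot\eta_0(\cdot; \theta_0)]\right\}^{-1} \\
& \times n^{1/2}\left\{(\Psi_n - \Psi)(\theta_0, \eta_0(\cdot; \theta_0)) + \dot\Psi_2(\theta_0, \eta_0(\cdot; \theta_0))[(\hat\eta_n - \eta_0)(\cdot; \theta_0)]\right\} + o_{p^*}(1). \quad □
\end{aligned}$$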